Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields
Abstract
The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in Bulgarian news text. We combine well-established features used for other languages with language-specific lexical, syntactic and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving an F1 score of 89.4%, which is comparable to state-of-the-art results for English.
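To make the idea of combining generic and language-specific features concrete, here is a minimal sketch of per-token feature extraction for a CRF tagger. It is an illustration under stated assumptions, not the authors' feature set. The gazetteer contents, the tag strings, and the helper names (`coarse_tag`, `word_shape`, `token_features`, `sentence_features`) are all hypothetical. In particular, `coarse_tag` assumes that the leading characters of a fine-grained morpho-syntactic tag encode the part of speech, and it simply truncates the tag. This stands in for, but does not reproduce, the paper's derived local and nonlocal tagsets. The output is a list of feature dictionaries per sentence, which is the input shape accepted by toolkits such as sklearn-crfsuite.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical location gazetteer (lowercased); the paper's gazetteers are not reproduced.
LOCATION_GAZETTEER: Set[str] = {"софия", "пловдив", "варна"}


def coarse_tag(fine_tag: str, keep: int = 2) -> str:
    """Collapse a fine-grained tag to its first `keep` characters.

    Assumption: the leading characters encode the part of speech
    (e.g. 'Npfsi' -> 'Np'). Illustrative only, not the paper's tagset.
    """
    return fine_tag[:keep]


def word_shape(word: str) -> str:
    """Map uppercase/lowercase/digit to X/x/d, keep other chars, squeeze repeats."""
    shape: List[str] = []
    for ch in word:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(sent: List[Tuple[str, str]], i: int) -> Dict[str, object]:
    """Build a feature dict for token i of a sentence of (word, tag) pairs."""
    word, tag = sent[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:].lower(),   # crude morphology cue
        "word.istitle": word.istitle(),      # capitalization cue
        "word.shape": word_shape(word),
        "tag.fine": tag,                     # full morpho-syntactic tag
        "tag.coarse": coarse_tag(tag),       # reduced, task-specific tag
        "gaz.location": word.lower() in LOCATION_GAZETTEER,
    }
    # Context window of one token on each side, with sentence-boundary flags.
    if i > 0:
        prev_word, prev_tag = sent[i - 1]
        feats["-1:word.lower"] = prev_word.lower()
        feats["-1:tag.coarse"] = coarse_tag(prev_tag)
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        next_word, next_tag = sent[i + 1]
        feats["+1:word.lower"] = next_word.lower()
        feats["+1:tag.coarse"] = coarse_tag(next_tag)
    else:
        feats["EOS"] = True
    return feats


def sentence_features(sent: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Feature dicts for every token in the sentence."""
    return [token_features(sent, i) for i in range(len(sent))]
```

For example, `sentence_features([("Министърът", "Ncmsf"), ("посети", "Vpitf"), ("София", "Npfsi")])` uses illustrative tags and flags the last token as a gazetteer hit with coarse tag `"Np"`. Keeping both the fine and the coarse tag lets the CRF weigh the detailed morphology against a sparser-but-simpler reduced tagset.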
Similar resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrid approaches. Machine-learning-based methods perform well for Persian if they are trained with good features. To get good performanc...
Two Step Chinese Named Entity Recognition Based on Conditional Random Fields Models
This paper mainly describes a Chinese named entity recognition (NER) system, NER@ISCAS, which integrates text, part-of-speech and small-vocabulary character-list features with heuristic post-processing rules for the MSRA NER open track, within the framework of the Conditional Random Fields (CRF) model.
Biomedical Named Entity Recognition using Conditional Random Fields and Rich Feature Sets
As the wealth of biomedical knowledge in the form of literature increases, there is a rising need for effective natural language processing tools to assist in organizing, curating, and retrieving this information. To that end, named entity recognition (the task of identifying words and phrases in free text that belong to certain classes of interest) is an important first step for many of these ...
Biomedical and Chemical Named Entity Recognition with Conditional Random Fields: The Advantage of Dictionary Features
We present our work on Chemical and Biomedical Named Entity Recognition (NER) using Machine Learning algorithms with different feature sets. We demonstrate that the best results are obtained using Conditional Random Fields. Furthermore, we show the advantage of dictionary-based features in this context. All results are obtained with the benchmark settings of the Joint Workshop on ...
Feature Subset Selection in Conditional Random Fields for Named Entity Recognition
In the application of Conditional Random Fields (CRF), a huge number of features is typically taken into account. These models can deal with interdependent and correlated data with an enormous complexity. The application of feature subset selection is important to improve performance, speed and explainability. We present and compare filtering methods using information gain or χ² as well as an ...